added dremio data_type_list #843
Walkthrough
Added a new macro dremio__data_type_list(data_type) that returns Dremio-compatible type lists for the string, numeric, timestamp, and boolean categories, or an empty list otherwise. Minor whitespace adjustments were made around macro boundaries. No other functional changes.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~7 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
macros/utils/data_types/data_type_list.sql (2 hunks)
🔇 Additional comments (2)
macros/utils/data_types/data_type_list.sql (2)
132-132: Whitespace-only change; no action needed.
Non-functional whitespace change after trino__data_type_list. Safe to leave as-is.
154-154: Whitespace-only change; no action needed.
Non-functional whitespace change after clickhouse__data_type_list endmacro. Safe to leave as-is.
{% macro dremio__data_type_list(data_type) %}
    {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
    {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
    {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
    {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}

    {%- if data_type == 'string' %}
        {{ return(string_list) }}
    {%- elif data_type == 'numeric' %}
        {{ return(numeric_list) }}
    {%- elif data_type == 'timestamp' %}
        {{ return(timestamp_list) }}
    {%- elif data_type == "boolean" %}
        {{ return(boolean_list) }}
    {%- else %}
        {{ return([]) }}
    {%- endif %}

{% endmacro %}
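To show how these lists end up being consumed, here is a minimal sketch of a membership check against the numeric category. The helper name `is_dremio_numeric_type` and its `column_data_type` argument are hypothetical and not part of this PR. The sketch also calls the Dremio macro directly for brevity, whereas real callers would presumably reach it through the package's dispatching macro.

```sql
{# Hypothetical helper for illustration only; not part of this PR. #}
{% macro is_dremio_numeric_type(column_data_type) %}
    {# Fetch the numeric category from the macro added in this PR. #}
    {% set numeric_types = dremio__data_type_list('numeric') %}
    {# Upper-case the reported type so the comparison matches the upper-case list entries. #}
    {{ return((column_data_type | upper) in numeric_types) }}
{% endmacro %}
```

Because the check is a plain list membership test, any type name that Dremio never reports is harmless but dead weight, while a missing name silently excludes matching columns.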
Dremio type list added — verify aliases and TZ types; consider adding REAL/BOOL
Good addition and consistent with the dispatch pattern. A couple of points to tighten correctness vs Dremio’s reported data_type values:
- Add aliases: REAL (numeric) and BOOL (boolean) are commonly seen; consider including them.
- Verify whether BIT is actually returned by Dremio INFORMATION_SCHEMA vs upstream sources; it might be safer to include BOOL and keep BIT only if confirmed.
- Verify support for TIME WITH TIME ZONE and TIMESTAMP WITH TIME ZONE in Dremio; if unsupported, they should be removed to avoid false-positive matches.
Proposed minimal improvement (adds REAL and BOOL; leave BIT and TZ types pending verification):
{% macro dremio__data_type_list(data_type) %}
- {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
- {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
- {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
- {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}
+ {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
+ {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','REAL','SMALLINT','TINYINT'] | list %}
+ {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
+ {% set boolean_list = ['BOOLEAN', 'BOOL', 'BIT'] | list %}
Additionally, if Dremio reports fixed-length CHAR/CHARACTER in INFORMATION_SCHEMA for some sources, consider including:
- string_list: add 'CHAR', 'CHARACTER'
Please confirm with Dremio docs and/or INFORMATION_SCHEMA outputs before finalizing removals.
To validate support and normalize to actual INFORMATION_SCHEMA values, please check Dremio docs for supported data types and the canonical strings reported in INFORMATION_SCHEMA:
🌐 Web query:
Dremio SQL data types and INFORMATION_SCHEMA data_type values. Specifically: does Dremio support SMALLINT, TINYINT, REAL, BOOL, BIT, TIME WITH TIME ZONE, and TIMESTAMP WITH TIME ZONE, and what are the exact canonical type names returned?
💡 Result:
Dremio does not natively support canonical SQL types SMALLINT, TINYINT, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE; it maps them to other types. The canonical type names returned in INFORMATION_SCHEMA and Dremio metadata for supported types are as follows:
Support for Specific Types & Canonical Type Names
Supported by Dremio
- BIT: Mapped to BOOLEAN[7][1].
- BOOL: Returned as BOOLEAN[7][1].
- REAL: Returned as FLOAT (single-precision float)[1][3][4][7].
- SMALLINT: Mapped to INTEGER[1][6][7][9].
- TINYINT: Mapped to INTEGER[1][6][9].
- TIME: Returned as TIME (without time zone)[1][6][7].
- TIMESTAMP: Returned as TIMESTAMP (without time zone)[1][6][7].
- TIMESTAMP WITH TIME ZONE: Mapped to TIMESTAMP (without TZ information; Dremio ignores/strips the time zone part)[6][7].
Not Supported as Canonical Types
- TIME WITH TIME ZONE: Mapped to TIME (no TZ)[6][7].
- TIMESTAMP WITH TIME ZONE: Mapped to TIMESTAMP (no TZ)[6][7].
- SMALLINT/TINYINT: Returned as INTEGER (not as "SMALLINT" or "TINYINT")[1][6][7][9].
Exact Canonical Type Names
The values Dremio reports in metadata (including INFORMATION_SCHEMA.COLUMNS) are the canonical type names listed below (case-sensitive, all uppercase):
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- BOOLEAN
- DECIMAL
- VARBINARY
- VARCHAR
- DATE
- TIME
- TIMESTAMP
- INTERVAL (day to seconds)
- INTERVAL (years to months)
These are the types you will observe when querying for column data types in Dremio's metadata tables and via INFORMATION_SCHEMA[1][3][4][6][7].
Mappings from External Sources
- SQL Server: SMALLINT, TINYINT, BIT → INTEGER; BOOL/BIT → BOOLEAN; REAL → FLOAT; no support for TIME WITH TIME ZONE or true TIMESTAMP WITH TIME ZONE (mapped to TIME/TIMESTAMP)[1].
- PostgreSQL: BOOL/BIT → BOOLEAN; SMALLINT/TINYINT → INTEGER; REAL → FLOAT; TIMESTAMPTZ/TIMETZ → TIMESTAMP/TIME (zone info stripped)[7].
- Teradata: SMALLINT/TINYINT/BYTEINT → INTEGER; TIME WITH TIMEZONE → TIME; TIMESTAMP WITH TIMEZONE → TIMESTAMP[6].
Summary Table
| SQL Type | Canonical Dremio Type | Supported/Mapping Notes |
|---|---|---|
| SMALLINT | INTEGER | Mapped |
| TINYINT | INTEGER | Mapped |
| REAL | FLOAT | Canonical name is FLOAT |
| BIT | BOOLEAN | Mapped |
| BOOL | BOOLEAN | Canonical name is BOOLEAN |
| TIME WITH TIME ZONE | TIME | Zone info not stored |
| TIMESTAMP WITH TIME ZONE | TIMESTAMP | Zone info not stored |
If you query INFORMATION_SCHEMA.COLUMNS, these canonical types (INTEGER, FLOAT, BOOLEAN, TIME, TIMESTAMP, etc.) will be returned—not the original SQL type names from the source system[1][3][6][7][9].
If you need to preserve the distinction between, for example, SMALLINT/TINYINT and INTEGER or retain time zone information, Dremio does not natively support this at the type level. You'll need to manage such semantics in your data model or ETL process outside Dremio.
Citations:
- 1: https://docs.dremio.com/25.x/reference/sql/data-types/mappings/microsoft-sql-server/
- 2: https://docs.dremio.com/cloud/reference/sql/information-schema/schemata/
- 3: https://docs.dremio.com/cloud/reference/sql/data-types/
- 4: https://docs.dremio.com/24.3.x/reference/sql/data-types/
- 5: https://docs.dremio.com/current/reference/sql/information-schema/tables/
- 6: https://docs.dremio.com/24.3.x/reference/sql/data-types/mappings/teradata/
- 7: https://docs.dremio.com/cloud/reference/sql/data-types/mappings/postgres/
- 8: https://docs.dremio.com/24.3.x/reference/sql/information-schema/catalogs/
- 9: https://docs.dremio.com/24.3.x/reference/sql/data-types/mappings/hive/
- 10: https://docs.dremio.com/current/reference/sql/data-types/mappings/oracle/
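The web result above is worth checking against a live environment before any names are removed. A minimal sketch of such a check follows. It assumes the Dremio instance exposes INFORMATION_SCHEMA."COLUMNS" with a DATA_TYPE column, as the documentation cited above describes. The output depends entirely on the connected sources, so no expected result is shown.

```sql
-- List every distinct type name that Dremio actually reports, with a column count.
-- Assumption: INFORMATION_SCHEMA."COLUMNS" is queryable and has a DATA_TYPE column.
SELECT
    DATA_TYPE,
    COUNT(*) AS column_count
FROM INFORMATION_SCHEMA."COLUMNS"
GROUP BY DATA_TYPE
ORDER BY column_count DESC;
```

Any DATA_TYPE value returned here that is missing from the macro's lists is a candidate for addition, and any list entry that never appears across representative sources is a candidate for removal.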
Update Dremio data_type_list to use only canonical INFORMATION_SCHEMA types
Dremio only returns the uppercase canonical type names—it does not emit SMALLINT, TINYINT, REAL, BOOL, BIT, or TZ-qualified types in INFORMATION_SCHEMA.COLUMNS. Tighten the macro to match exactly what Dremio reports:
• File: macros/utils/data_types/data_type_list.sql
• Lines: 155–173
{% macro dremio__data_type_list(data_type) %}
- {% set string_list = ['VARCHAR', 'CHARACTER VARYING'] | list %}
- {% set numeric_list = ['INT','INTEGER','BIGINT','DOUBLE','DECIMAL','FLOAT','SMALLINT','TINYINT'] | list %}
- {% set timestamp_list = ['DATE','TIME','TIMESTAMP', 'TIME WITH TIME ZONE', 'TIMESTAMP WITH TIME ZONE'] | list %}
- {% set boolean_list = ['BOOLEAN', 'BIT'] | list %}
+ {% set string_list = ['VARCHAR'] | list %}
+ {% set numeric_list = ['INTEGER','BIGINT','FLOAT','DOUBLE','DECIMAL'] | list %}
+ {% set timestamp_list = ['DATE','TIME','TIMESTAMP'] | list %}
+ {% set boolean_list = ['BOOLEAN'] | list %}
{%- if data_type == 'string' %}
{{ return(string_list) }}
@@ unchanged @@
{% endmacro %}
This aligns the lists with what Dremio’s INFORMATION_SCHEMA actually returns.
📝 Committable suggestion
{% macro dremio__data_type_list(data_type) %}
    {% set string_list = ['VARCHAR'] | list %}
    {% set numeric_list = ['INTEGER','BIGINT','FLOAT','DOUBLE','DECIMAL'] | list %}
    {% set timestamp_list = ['DATE','TIME','TIMESTAMP'] | list %}
    {% set boolean_list = ['BOOLEAN'] | list %}
    {%- if data_type == 'string' %}
        {{ return(string_list) }}
    {%- elif data_type == 'numeric' %}
        {{ return(numeric_list) }}
    {%- elif data_type == 'timestamp' %}
        {{ return(timestamp_list) }}
    {%- elif data_type == "boolean" %}
        {{ return(boolean_list) }}
    {%- else %}
        {{ return([]) }}
    {%- endif %}
{% endmacro %}
Summary by CodeRabbit